Tag
8 articles
Learn how to build a system that processes audio and video inputs to generate code, simulating the capabilities of multimodal AI models like Qwen3.5-Omni.
Cohere has released an open-source speech recognition model that outperforms OpenAI's industry-leading Whisper in benchmark tests.
Cohere AI has released Cohere Transcribe, a state-of-the-art automatic speech recognition model designed to transform audio into actionable text for enterprise use cases.
Learn how Google's new WAXAL dataset helps improve speech technology for African languages by providing training data for AI systems.
Learn how IBM's new Granite 4.0 1B Speech AI model helps computers understand and translate speech in multiple languages, even on small devices without internet access.
This explainer explores ChatGPT's Voice Mode technology, examining its multimodal architecture, real-time processing challenges, and implications for AI accessibility and reliability.
Learn to build a basic AI voice assistant that can handle phone call interactions using Python, speech recognition, and text-to-speech technologies.
Learn to implement and compare speech-to-text capabilities using Google Cloud and ElevenLabs APIs, including audio processing, transcription functions, and service evaluation.